智能论文笔记

Understanding Political Polarisation using Language Models: A dataset and method

Samiran Gode , Supreeth Bare , Bhiksha Raj , Hyungon Yoo

分类：自然语言处理

2023-01-02

Our paper aims to analyze political polarization in US political system using Language Models, and thereby help candidates make an informed decision. The availability of this information will help voters understand their candidates views on the economy, healthcare, education and other social issues. Our main contributions are a dataset extracted from Wikipedia that spans the past 120 years and a Language model based method that helps analyze how polarized a candidate is. Our data is divided into 2 parts, background information and political information about a candidate, since our hypothesis is that the political views of a candidate should be based on reason and be independent of factors such as birthplace, alma mater, etc. We further split this data into 4 phases chronologically, to help understand if and how the polarization amongst candidates changes. This data has been cleaned to remove biases. To understand the polarization we begin by showing results from some classical language models in Word2Vec and Doc2Vec. And then use more powerful techniques like the Longformer, a transformer based encoder, to assimilate more information and find the nearest neighbors of each candidate based on their political view and their background.

translated by 谷歌翻译

The problem of generating an optimal coalition structure for a given coalition game of rational agents is to find a partition that maximizes their social welfare and is known to be NP-hard. This paper proposes GCS-Q, a novel quantum-supported solution for Induced Subgraph Games (ISGs) in coalition structure generation. GCS-Q starts by considering the grand coalition as initial coalition structure and proceeds by iteratively splitting the coalitions into two nonempty subsets to obtain a coalition structure with a higher coalition value. In particular, given an $n$-agent ISG, the GCS-Q solves the optimal split problem $\mathcal{O} (n)$ times using a quantum annealing device, exploring $\mathcal{O}(2^n)$ partitions at each step. We show that GCS-Q outperforms the currently best classical solvers with its runtime in the order of $n^2$ and an expected worst-case approximation ratio of $93\%$ on standard benchmark datasets.

translated by 谷歌翻译

Many studies have examined the shortcomings of word error rate (WER) as an evaluation metric for automatic speech recognition (ASR) systems, particularly when used for spoken language understanding tasks such as intent recognition and dialogue systems. In this paper, we propose Hybrid-SD (H_SD), a new hybrid evaluation metric for ASR systems that takes into account both semantic correctness and error rate. To generate sentence dissimilarity scores (SD), we built a fast and lightweight SNanoBERT model using distillation techniques. Our experiments show that the SNanoBERT model is 25.9x smaller and 38.8x faster than SRoBERTa while achieving comparable results on well-known benchmarks. Hence, making it suitable for deploying with ASR models on edge devices. We also show that H_SD correlates more strongly with downstream tasks such as intent recognition and named-entity recognition (NER).

translated by 谷歌翻译

当前，随机化是用于机器人技术中数据驱动的学习算法的SIM2REAL传输中广泛使用的方法。尽管如此，大多数SIM2REAL研究报告了特定随机技术的结果，并且通常是在高度定制的机器人系统上，因此很难系统地评估不同的随机方法。为了解决这个问题，我们为机器人触及余量操纵器任务定义了易于制作的实验设置，该设置可以作为比较的基准。我们将四个随机策略与模拟和真实机器人中的三个随机参数进行比较。我们的结果表明，更多的随机化有助于SIM2REAL转移，但它也可能损害算法在模拟中找到良好策略的能力。完全随机的仿真和微调显示出差异化的结果，并且比测试的其他方法更好地转化为实际机器人。

translated by 谷歌翻译